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Abstract 

In  this  paper  we  propose  modeling  and  analysis  techniques  for  ge¬ 
netic  networks  that  provide  biologists  with  insight  into  the  dynamics 
of  such  systems.  Central  to  our  modeling  approach  is  the  framework 
of  hybrid  systems  and  our  analysis  tools  are  derived  from  formal 
analysis  of  such  systems.  Given  a  set  of  states  characterizing  a  prop¬ 
erty  of  biological  interest  V,  we  present  the  Multi-Affine  Rectangular 
Partition  (MARP)  algorithm  for  the  construction  of  a  set  of  infeasible 
states  X  that  will  never  reach  V  and  the  Rapidly  Exploring  Random 
Forest  of  Trees  ( RRFT )  algorithm  for  the  construction  of  a  set  of 
feasible  states  J-  that  will  reach  V.  These  techniques  are  scalable  to 
high  dimensions  and  can  incorporate  uncertainty  ( partial  knowledge 
of  kinetic  parameters  and  state  uncertainty).  We  apply  these  methods 
to  understand  the  genetic  interactions  involved  in  the  phenomenon 
of  luminescence  production  in  the  marine  bacterium  V.  fischeri. 

KEY  WORDS — genetic  networks,  hybrid  systems,  formal 
analysis,  rapidly-exploring  random  trees 

1.  Introduction 

The  recent  completion  of  a  draft  of  the  human  genome  and 
the  sequencing  of  several  other  organisms  has  provided  a  vast 
amount  of  genomic  data  for  advancing  our  understanding  of 
fundamental  biological  processes.  However,  in  order  to  un¬ 
derstand  different  cellular  behaviors  such  as  differentiation, 
response  to  environmental  signals,  and  cell-to-cell  communi¬ 
cation,  we  need  to  study  the  regulatory  systems  determining 
the  expression  of  genes.  This  is  usually  a  complex  process. 
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which  can  be  regulated  at  several  stages  such  as  transcrip¬ 
tion  (the  best  studied  form  of  regulation),  translation,  and 
post-translational  modification  of  proteins.  An  example  of 
transcriptional  regulation  is  repression:  a  regulatory  molecule 
binds  to  a  regulatory  site  of  some  gene  preventing  the  ribo¬ 
nucleic  acid  (RNA)  polymerase  from  transcribing  the  gene. 
The  number  of  regulating  factors  is  usually  large,  and  it  in¬ 
volves  proteins  (products  of  other  genes  and  possibly  of  the 
gene  itself),  RNA,  and  other  molecules.  A  collection  of  in¬ 
teracting  genes,  their  products,  and  some  other  molecules 
involved  in  the  regulation  of  the  genes  form  a  “genetic  regu¬ 
latory  network”. 

The  traditional  approach  to  modeling  of  genetic  networks 
leads  to  highly  nonlinear  systems  of  differential  equations 
for  which  analytical  solutions  are  not  normally  possible.  One 
way  to  work  around  the  difficulties  of  the  nonlinearities  is  to 
use  simplified,  approximate  models.  Existing  work  focuses 
on  very  low-dimensional  genetic  networks.  Decoupled  piece- 
wise  linear  differential  equations  (PLDEs)  are  considered  in 
Glass  (1975)  and  Mestl,  Plathe,  and  Omholt  (1995),  where 
gene  regulation  is  modeled  as  a  discontinuous  step  function 
and  chemical  reactions  are  ignored.  This  (over)simplified  ap¬ 
proach  to  modeling  allows  for  interesting  qualitative  analysis 
(de  Jong  et  al.  2003).  An  even  more  radical  idealization  is  ob¬ 
tained  if  the  state  of  a  gene  is  abstracted  to  a  Boolean  variable 
and  the  interaction  among  elements  to  Boolean  functions,  as 
in  Boolean  networks  (Kauffmann  1969).  While  amenable  for 
interesting  analysis,  the  methods  mentioned  above  are  based 
on  assumptions  which  disregard  important  biochemical  phe¬ 
nomena.  Most  of  them  only  capture  protein  dynamics  but 
cannot  accommodate  chemical  reactions  (Kauffmann  1969). 

Our  modeling  approach  is  based  on  hybrid  systems  (Lynch 
and  Krogh  2000),  i.e.,  systems  in  which  discrete  events  are 
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combined  with  continuous  differential  equations.  In  rectan¬ 
gular  regions  of  the  state  space  where  the  chemical  dynamics 
can  be  reasonably  approximated  as  smooth  we  model  them 
using  deterministic,  nonlinear  (multi-affine)  ordinary  differ¬ 
ential  equations.  We  assume  that  the  chemical  concentrations 
are  spatially  homogeneous,  locally,  eliminating  the  need  to  use 
partial  differential  equations  or  rarefied  molecular  stochastic 
models.  Similar  to  the  Boolean  approach  described  above,  the 
discrete  component  of  the  model  captures  the  switching  be¬ 
havior  that  is  observed  in  phenomena  such  as  transcription, 
protein-protein  interactions,  and  cell  division  and  growth.  We 
also  propose  the  use  of  hybrid  system  as  the  natural  frame¬ 
work  for  giving  a  global  description  of  a  biological  system 
described  locally  around  operating  points  by  simpler  dynam¬ 
ics,  which  are  easier  to  approach  for  analysis.  Our  own  work 
using  hybrid  systems  to  model,  simulate  and  perform  prelim¬ 
inary  analysis  on  low-dimensional  genetic  networks  is  given 
(Alur  et  al.  2001,  2002a,  2002b;  Belta  et  al.  2001;  Belta,  Ha- 
bets,  and  Kumar  2002);  other  work  includes  Glass  (1975)  and 
de  Jong  et  al.  (2003). 

We  are  interested  in  developing  general  modeling,  sim¬ 
ulation,  and  analysis  techniques  for  metabolic  and  genetic 
networks.  Our  ultimate  goal  is  to  create  tools  enabling  us 
to  answer  biologically  significant  questions  of  the  following 
types.  “If  an  organism  is  initially  in  a  state  described  by  certain 
ranges  of  metabolite  and  enzyme  concentrations,  and  levels 
of  activation  of  genes,  describe  the  set  of  states  that  the  or¬ 
ganism  can  reach  in  T  seconds.”  Or,  “describe  the  states  with 
the  property  that  if  the  system  starts  in  any  of  them  it  will 
never  reach  an  undesired  state.”  The  undesired  state  could 
correspond,  for  example,  to  a  certain  disease.  We  denote  the 
set  associated  with  properties  of  interest  as  V.  To  answer  the 
questions  formulated  above,  we  are  interested  in  determin¬ 
ing  two  disjoint  sets  of  initial  conditions  that  can  be  associ¬ 
ated  with  the  set  V  (see  Figure  1).  First,  we  are  interested 
in  characterizing  feasible  sets  T,  consisting  of  initial  con¬ 
ditions  that  make  it  possible  for  the  system  to  enter  the  set 
V  under  some  combination  of  parameter  values  and  noise. 
We  are  also  interested  in  the  infeasible  sets  X,  from  which 
it  is  impossible  to  enter  the  set  V.  Knowledge  of  T  may  as¬ 
sist  in  experiment  design;  while  a  knowledge  of  the  set  X,  is 
particularly  useful  for  model  validation.  If  experimental  data 
indicate  the  system  enters  V  and  if  experimental  conditions 
were  known  to  lie  in  set  X,  one  could  prove  the  model  to  be 
inconsistent  with  the  experimental  data.  In  many  ways,  the  ap¬ 
proaches  to  generating  the  two  sets,  and  the  information  they 
encode,  are  complementary — neither  approach  alone  is  com¬ 
plete.  Together  however,  knowledge  of  the  sets  T  and  X  can 
enable  us  to  make  some  powerful  assertions  about  the  system’s 
behavior. 

The  connection  between  biomolecular  networks  and 
robotic  systems  exists  on  two  levels.  From  a  modeling  point 
of  view,  robotic  systems  share  many  of  the  salient  features 
of  biological  system  models  described  above.  Just  as  the  rate 


Fig.  1.  The  three  sets  of  interest:  the  property  set  V,  the 
infeasible  set  X  from  which  no  trajectory  can  enter  V,  and  the 
feasible  set  T  from  which  there  exists  at  least  one  trajectory 
which  can  enter  V . 


equations  for  biomolecular  networks  are  known  to  qualita¬ 
tively  switch  based  on  the  presence  or  absence  of  various 
inhibitor  genes,  robotic  systems  often  employ  different  con¬ 
trollers  and  estimators  in  different  regimes  (Das  et  al.  2002) 
and  their  dynamics  switch  based  on  the  contact  mechanics 
of  rolling  and  sliding.  Therefore,  both  types  systems  can  be 
modeled  as  hybrid  systems.  In  addition,  many  robotic  sys¬ 
tems  consist  of  multi-agent  teams,  and  therefore  the  interac¬ 
tions  and  messaging  among  the  team  members  must  be  taken 
into  account  and  many  of  the  system  properties  are  distributed 
spatially.  Multicell  networks  behave  in  much  the  same  way. 
Finally,  the  significant  modeling  uncertainty,  which  is  cen¬ 
tral  to  our  discussion  and  analysis  of  biological  systems,  is  a 
common  theme  in  mobile  robotics  operating  in  unstructured 
environments. 

Perhaps  a  deeper  connection  between  the  two  fields  exists 
at  the  level  of  the  types  of  problems  we  seek  to  solve.  The 
problem  of  finding  sets  of  states  T,  from  which  the  system 
may  reach  V,  is  similar  to  the  motion  planning  problem  in 
robotics  where  the  goal  is  to  find  a  trajectory  (if  one  exists) 
from  the  starting  configuration  to  the  goal  configuration.  De¬ 
termining  an  infeasible  set  X,  from  which  it  is  impossible  to 
reach  the  property  set  V  is  closely  related  to  trajectory  gen¬ 
eration,  controllability  and  steering.  As  one  can  imagine,  the 
literature  on  simulation  and  verification  of  hybrid  systems  is 
also  particularly  relevant  to  our  discussion  (Henzinger  et  al. 
1995, 2000;  Lafferriere,  Pappas,  and  Yovine  1999;  Butchkarev 
and  Tripakis  2000;  Mitchell  and  Tomlin  2000;  Asarin,  Dang, 
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and  Maler  2001 ;  Chutinam  and  Krogh  2003;  Tabuda  and  Pap¬ 
pas  2003). 

In  this  paper  we  develop  methodologies  for  finding  the  sets 
shown  in  Figure  1 .  In  Section  2,  we  introduce  the  hybrid  sys¬ 
tem  modeling  paradigm  and  provide  the  basic  definitions  that 
will  be  used  throughout  the  paper.  In  Section  3,  we  exploit  the 
particular  structure  of  the  hybrid  models  of  genetic  networks 
to  derive  a  computationally  attractive  algorithm,  the  Multi- 
Affine  Rectangular  Partitioning  (MARP)  algorithm,  to  com¬ 
pute  infeasible  sets  X  based  on  evaluating  the  vector  field  at  the 
vertices  of  the  rectangular  invariants.  In  Section  4  we  extend 
the  popular  Rapidly  Exploring  Random  Tree  (RRT)  algorithm 
from  the  motion  planning  literature  to  address  time-invariant 
uncertainty  such  as  unknown  initial  conditions  or  rate  con¬ 
stants.  The  resulting  algorithm,  called  the  Rapidly  Exploring 
Random  Forest  of  Trees  (RRFT)  algorithm,  allows  one  to  ad¬ 
dress  the  reachability  problem  probabilistically  for  complex 
high-dimensional  systems  having  both  time-varying  and  time- 
invariant  uncertainty,  providing  a  natural  way  to  determine  if 
a  set  should  be  included  in  T .  The  algorithm,  which  has  po¬ 
tential  for  parallel  implementation,  estimates  the  growth  and 
coverage  of  the  trees  and  uses  this  information  to  modify  the 
search.  The  usefulness  of  the  two  algorithms  is  illustrated 
in  Section  5,  where  we  study  the  phenomenon  of  biolumi¬ 
nescence  production  in  the  marine  bacterium  V.  fscheri  by 
analyzing  the  corresponding  genetic  network.  The  paper  con¬ 
cludes  with  final  remarks  and  directions  of  future  research  in 
Section  6. 

2.  Hybrid  System  Modeling  of  Genetic  Networks 

Hybrid  systems  are  dynamical  systems  with  both  discrete  and 
continuous  state  changes  (Lynch  and  Krogh  2000).  We  are  in¬ 
terested  in  a  special  class  of  hybrid  systems,  called  switched 
systems,  which  are  defined  as  having  different  dynamics  in 
different  non-overlapping  regions  of  the  state  space.  In  our 
view,  hybrid  and  switched  systems  are  appropriate  and  attrac¬ 
tive  for  modeling  the  dynamics  of  biomolecular  networks  for 
two  main  reasons: 

Hybrid  Systems  are  Global  Descriptions  from  Simpler 
Local  Models.  Computationally  attractive  formalisms  for 
modeling  biomolecular  networks  such  as  linearizations,  half¬ 
systems  (Savageau  and  Voit  1987),  synergistic  (S)  systems 
(Savageau  1969),  generalized  mass  action  (GMA;  Peschel  and 
Mende  1986),  and  power  law  (Heinrich  and  Schuster  1996) 
are  only  valid  locally  around  operating  points.  For  example, 
the  S-systems  can  be  thought  of  as  linearizations  of  “real” 
systems  in  logarithmic  coordinates  (Savageau  1969).  Then, 
in  this  case,  a  global  description  of  the  network  is  a  collec¬ 
tion  of  regions  with  different  polynomial  vector  fields  in  each 
region,  therefore  a  hybrid  system.  The  specific  nonlinearities 
of  dynamics  in  each  region  are  simpler  than  the  dynamics  of 
a  global  continuous  description,  and  easier  to  approach  for 
analysis. 


Hybrid  Systems  Capture  Discrete  Events.  Discrete  dynam¬ 
ics  are  necessary  to  capture  switching  behavior  that  is  ob¬ 
served  in  phenomena  such  as  transcription,  protein-protein 
interactions,  and  cell  division  and  growth.  Consider,  for  ex¬ 
ample,  the  case  when  a  metabolite  from  the  network  regulates 
the  production  of  a  metabolic  enzyme  expressed  from  a  gene 
with  a  strong  promoter.  Then,  the  gene  can  be  “on”  and  “off”, 
which  induces  two  different  dynamics  of  the  network,  as  a 
function  of  the  concentration  of  the  metabolite. 

Formally,  hybrid  systems  are  defined  as  tuples 

HS  —  (Q,  X,  X°,  I,  T,  F)  (1) 

where  Q  is  a  finite  set  of  discrete  variables,  X  is  the  set  of 
continuous  variables  x,  X  is  the  set  of  all  evaluations  of  x 
over  the  corresponding  domains,  Q  is  the  countable  set  of 
discrete  states,  called  modes,  or  locations,  X°  C  Q  x  X  is  a 
set  of  initial  states,  I  is  a  map  which  assigns  to  each  discrete 
location  in  Q  an  invariant  set,  T  C  Q  x  X  x  Q  is  a  set  of 
discrete  transitions,  and  F  :  Q  — >  (X  ->  7'X)  is  a  mapping 
that  specifies  the  continuous  (possibly  time-dependent)  flow 
in  each  discrete  state. 

We  focus  on  vector  fields  /  that  are  products  of  the  state 
components  to  capture  the  nonlinearities,  which  are  specific 
to  dynamics  of  chemical  reactions.  To  take  into  account  pos¬ 
sible  modeling  noise  and  to  accommodate  non-deterministic 
modeling  approaches,  we  also  allow  for  an  additive  noise  term 
v(t)  in  the  vector  field.  Therefore,  as  suggested  by  Kepler  and 
Elston  (2001),  the  vector  field  associated  with  the  map  F  takes 
the  form  F  —  fix)  +  v(f).  The  form  of  /  is  made  precise  in 
the  next  subsection,  while  v  is  discussed  in  Section  4. 

We  are  also  interested  in  constant  parameters  that  might 
be  unknown.  Examples  of  these  parameters  are  binding  con¬ 
stants  and  other  constants  determining  reaction  rate  kinetics. 
While  these  constants  are  known  to  lie  within  a  certain  range, 
their  exact  values  are  often  unknown.  It  is  useful  to  note  that 
these  unknown  constants  can  be  viewed  as  state  variables  with 
trivial  dynamics,  e.g.,  x  =  c,  x  =  0,  while  the  corresponding 
projection  of  X°  characterizes  the  known  bounds.  Thus,  our 
definition  of  HS  in  eq.  (1)  allows  for  unknown  parameters 
that  lie  in  some  specified  set. 

Finally,  since  molecular  networks  are  qualitatively  de¬ 
scribed  in  terms  of  ranges  of  concentrations  of  the  involved 
species,  a  rectangular  partition  of  the  state  space  is  naturally 
induced  and  the  invariants  are  rectangular. 

2.1.  Rectangular  Multi-Affine  Hybrid  Systems 

Rectangular  multi-affine  hybrid  systems  are  characterized  by 
rectangular  invariants  and  multi-affine  continuous  dynamics. 
A  function  with  a  vector  argument  is  called  multi-affine  if  it  is 
affine  in  each  component  of  its  argument,  i.e.,  when  all  other 
components  are  kept  constant.  More  formally,  we  define  a 
multi-affine  function  as  follows. 
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Definition  1 .  [Multi-affine  function]  A  multi-affine  func¬ 
tion  /  :  Rw  — >  Rw  is  a  polynomial  in  the  indeterminates 
xu  . . .  .  xN  with  the  property  that  the  degree  of  /  in  any  of 
the  indeterminates  .y, , . . .  ,  xN  is  less  than  or  equal  to  1 .  Stated 
differently,  /  has  the  form 

fix i - ,xN)  =  ^2  ch iNx\'  ■  ■  -x'jj,  (2) 

with  c.j  iN  e  Rw  for  all  ,iN  e  [0,  1}  and  using  the 

convention  that  if  ik  =  0,  then  xk  =  1 . 

An  /V-dimensional  rectangle  in  E,v  is  characterized  by  two 
vectors  a  =  (ax, . . .  ,  aN )  e  Rw  and  b  =  (b1, . . .  ,  bN)  e  R^, 
with  the  property  that  a,  <  b,  for  i  —  1 . A: 

Rm(a,  b)  =  {.r  =  (xi, . . .  ,  xN)  e  |  i  =  1, . . .  ,  A  : 

at  <  Xi  <  bj}.  (3) 

The  variables  xiti  =  1 . A  are  species  concentrations  and 

are  restricted  to  the  positive  quadrant.  Also,  there  are  practical 
upper  bounds  on  the  concentration  of  each  species.  Therefore, 
the  set  X  as  in  eq.  (1)  is  usually  specified  as  an  /V-rectangle. 
A  rectangular  partition  of  X  is  defined  as  follows.  Each  axis 
Oxt,  i  —  1,  ....  A  is  divided  into  n,  >  1  intervals  by  the 
thresholds  0  =  6°  <  0/  <  ...  <  9"‘ .  The  /th  interval  on 
the  (9 .Y, -axis,  i  =  1,  . . .  ,  A  is  therefore  defined  as  6-  < 

Xi  <  6- ,  j  =  1,  By  convention,  6°  =  0  and  9"‘  is 

an  upper  bound  giving  a  physical  limit  of  x, .  The  thresholds 
6  are  defined  as  values  of  species  concentrations  for  each 
the  dynamics  of  the  overall  system  changes.  For  example, 
they  can  be  concentrations  of  regulatory  species  for  which 
specific  genes  are  turned  “on”  and  “off”.  The  division  of  the 
axes  determines  a  partition  of  the  state  space  into  ]~[/li  n< 
rectangles.  If  we  let 

aiq  i-«w)  =  ( 6 1'~\  ...  ,  9qNN~l)  e  R",  biq'-qN) 

=  ,6™)  el»,  (4) 

for  qi  =  1,  . . .  ,  rii,  i  =  1,  . . .  ,  A,  then  an  arbitrary  rectangle 
in  the  partition  is  given  by 

RN{a(q'-qN\  b(qi-qN))  =  {(xu  ...  ,xN)e  RiV|6»fi_1  <  x,  <  6qt , 

i  =  1 . JV}.  (5) 

Due  to  the  different  levels  of  gene  transcription  activation 
and  enzymatic  action,  in  each  of  the  rectangles  the  system 
evolves  along  specific  multi-affine  vector  fields  (2): 

fiqi  -qN) (xu  ■ . . , xN)  =  J2  (6) 

>i . >wef0,l) 

where  x  e  RN(a(qi  -qN\  b{qi  ■qN))  and  e  Rw  for  all 

,  iN  €  [0,  1}  captures  specific  reaction  rates. 


Therefore,  our  models  of  biomolecular  networks  are  hy¬ 
brid  systems  (1)  with  the  set  of  labels  for  the  discrete 
states  Q  =  (q{...qN),  the  set  of  all  Y\H=ini  m°des 

Q  =  {(<7i  •  •  qi  =  1 . n,,  i  =  1, . . .  ,  A},  X 

is  the  set  of  species  symbols  Xi, . . .  ,xN,  the  invariant 
I(qi . . .  qN)  is  the  corresponding  rectangle  (5),  and  the  map 
F  has  a  deterministic  part  described  by  eq.  (6).  A  transition 
((qt  . . .  qN),  x,  (q J  . . .  q'N ))  corresponds  to  the  crossing  of  the 
boundary  between  rectangles  I(qi . . .  qN)  and  1  (q[ . . .  q'N)  at 
state  x. 

Remark  1.  Due  to  the  particular  shape  of  the  invariants,  a 
convenient  way  of  representing  a  rectangular  multi-affine  hy¬ 
brid  system  (6),  (5)  is  as  a  simple  graph  with  n,  nodes. 
Node  (q{  . . .  qN)  corresponds  to  rectangle  I  (qt  . . .  qN)  = 
RN(a(qi "qN\  b(qt  ■‘in))  and  has  associated  dynamics  (6).  An 
edge  in  the  graph  connects  nodes  corresponding  to  adjacent 
rectangles,  i.e.,  there  is  an  edge  between  any  pair  of  nodes 
that  differ  by  a  Hamming  distance  of  1.  See  Figure  7  for  an 
example  with  A  =  3,  rii  =  n2  =  n3  =  3. 

2.2.  Set  Definitions 

As  stated  before,  we  are  interested  in  characterizing  the  prop¬ 
erties  of  the  system  HS  related  to  whether  it  can  reach  a 
given  set  of  interest  V  described  by  a  rectangle  I  (/?,  . . .  pN), 
(Pi  ■  ■  ■  Pn )  ^  Q- 

Definition  2.  [Infeasible  set]  An  infeasible  set  2  is  a  col¬ 
lection  of  rectangles  I(qt  . . .  qN),  (qi  . . .  qN)  e  Q  with  the 
property  that  the  system  can  never  reach  I(p\  . . .  pN)  if  it 
starts  in  any  of  the  initial  states  contained  in  any  of  rectangles 
in  2. 

Definition  3.  [Feasible  set]  A  feasible  set  T  is  a  collection 
of  rectangles  l(qx . . .  qN),  (qx  . . .  qN)  e  Q  with  the  property 
that  they  contain  initial  states  that  will  reach  I(p{  . . .  pN)  in 
a  pre-specified  finite  time  interval. 

3.  Computing  An  Infeasible  Set  1 

In  this  section  we  introduce  the  MARP  algorithm,  which  uses 
the  properties  of  hybrid  multi-affine  rectangular  systems  to 
construct  infeasible  sets  2 ,  and  extends  some  results  presented 
in  Belta,  Habets,  and  Kumar  (2002).  In  the  form  presented  in 
this  paper,  the  algorithm  can  only  be  applied  to  determin¬ 
istic  vector  fields  where  F  =  f .  However,  as  explained  in 
Section  1,  it  can  easily  accommodate  rectangular  parametric 
uncertainties  since  set-valued  uncertainty  in  a  constant  pa¬ 
rameter  can  be  included  in  the  set  of  initial  conditions,  X°. 

3.1.  Preliminaries 

The  MARP  algorithm  is  based  on  the  fact  that  the  value  of  a 
multi-affine  function  (6)  is  uniquely  determined  everywhere 
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in  the  rectangular  invariant  /  (q,  . . .  qN)  by  its  values  at  the 
vertices.  Moreover,  it  is  a  convex  combination  of  these  values. 
Formally,  for  an  arbitrary  rectangle  (3),  let 

N 

VN(a,  b)  =  bi)  (7) 

i= 1 

denote  the  set  of  its  2N  vertices.  Let  §  :  {at, . . .  ,aN,bu...  , 
bN]  — »  {0,  1}  be  defined  by 

£(«,)  =  0,  $(&*)=!,  k=  1 . N.  (8) 


Proposition  1.  A  multi-affine  function  /  :  RN(a,b)  — > 
JR'''  is  a  convex  combination  of  its  values  f(vi ,  . . .  ,  vN)  at  the 
vertices  VN(a ,  b).  Explicitly, 

f«' . e  n  te)  im 

(vi,...  ,VN)eVh r(a,b)  k=l 

f(v i,...  ,  1%), 

(9) 


with 


1  = 


Y\(Xk~  Q*  y("t}  ^  hk  ~  \ 


l-ttvk) 


(vi ,VN)eVflj(a,b)  fc=l 


GO) 


where  (t>1; . . .  ,  vN )  e  VN(a ,  fr). 

The  proof  of  this  proposition  can  be  found  in  Belta,  Ha- 
bets,  and  Kumar  (2002).  Since  the  projection  of  a  multi-affine 
vector  field  along  a  given  direction  is  a  multi-affine  function, 
an  immediate  consequence  of  Proposition  1  can  be  used  to  de¬ 
velop  a  computationally  efficient  algorithm  for  constructing 
infeasible  sets  for  rectangular  multi-affine  hybrid  systems,  as 
follows. 

Corollary  1 .  The  projection  of  a  multi-affine  vector  field 
defined  on  a  rectangle  along  a  given  direction  is  positive  (neg¬ 
ative)  everywhere  in  the  rectangle  if  and  only  if  its  projection 
along  that  direction  is  positive  (negative)  at  the  vertices. 

3.2.  MARP  algorithm 

We  assume  that  the  piecewise  defined  vector  field  (6)  (possibly 
non-differentiable)  is  continuous  everywhere,  i.e.,  the  vector 
fields  in  adjacent  rectangles  coincide  on  the  common  facet.  A 
simple  consequence  of  Corollary  1  can  be  used  to  qualitatively 
analyze  the  system. 

Corollary  1  is  applied  to  the  facets  of  the  A'-rcctangles  (5) 
and  to  the  projections  of  the  vector  fields  along  the  corre¬ 
sponding  outer  normals.  Each  facet  is  an  (N  —  l)-rectangle. 
An  infeasible  set  X  can  be  built  by  defining  an  orientation 
for  the  simple  graph  of  the  network  defined  in  Remark  1. 


We  allow  for  both  unidirectional  and  bidirectional  edges  in 
the  oriented  graph.  The  semantics  of  the  orientation  are  de¬ 
fined  as  follows.  Let  (</,  . . .  qN )  and  (q[  . . .  q'N)  be  two  adjacent 
nodes  in  the  graph  and  I(q{  . . .  qN)  and  I(q[ . . .  q'N)  the  cor¬ 
responding  adjacent  rectangles.  A  unidirectional  edge  from 
(qt  . .  ,qN)to(q[ . . .  q'N)  means  that  there  exists  at  least  one  tra¬ 
jectory  originating  in  I(qi . .  ,qN)  that  enters  into  I  (q\ . . .  q'N) 
through  the  separating  facet,  and  there  is  no  trajectory  start¬ 
ing  in  l  (q\ . . .  q'N )  going  to  I  (q{ . . .  qN )  through  that  facet.  A 
bi-directional  edge  ensures  the  existence  of  at  least  one  tra¬ 
jectory  originating  in  I(qt  . . .  qN )  entering  into  I  (q[ . . .  q’N) 
and  at  least  one  trajectory  originating  in  l  (q\ . . .  q'N)  entering 
into  /  (<?!  . . .  qN). 


Algorithm  1.  Define  an  oriented  graph 

for  each  node  (q{  . . .  qN),  qt  =  1,  . . .  ,  i=  1,  . . .  .  N  do 
for  each  incident  edge  do 

for  each  vertex  of  the  corresponding  facet  do 
calculate  the  projection  of  along  the  outer 

normal  of  the  facet 
end  for 

if  the  projections  are  positive  at  all  vertices  then 
the  edge  is  unidirectional  oriented  out  of 
(<7i  ■  ■  -<?jv) 

end  if 

if  the  projections  are  negative  at  all  vertices  then 
the  edge  is  unidirectional  oriented  towards 

(<7i  ■  ■  -<?jv) 

end  if 

if  there  is  a  sign  change  among  the  projections  at  the 
vertices  then 

the  edge  is  bidirectional 

end  if 
end  for 

end  for 

Note  that,  in  the  oversimplified  description  above.  Algo¬ 
rithm  1  seems  inefficient.  Indeed,  if  we  apply  it  to  all  the  rect¬ 
angles  in  the  partition,  most  of  the  vertices  are  visited  more 
than  once,  and  such  multiple  evaluation  at  vertices  is  avoidable 
because  the  vector  fields  in  adjacent  triangles  match  on  the 
separating  facet.  A  more  efficient  description  would  require 
more  complicated  notation  and  a  more  detailed  discussion 
which  we  omit  because  it  is  peripheral  to  the  main  ideas  in 
the  paper. 

Using  the  oriented  graph,  we  can  now  construct  an  infea¬ 
sible  set  X.  Let  V  —  I(p\ . . .  pn)  denote  the  target  rectangle, 
or,  equivalently,  (p,  . . .  pN)  is  the  target  node  in  the  graph. 
The  following  algorithm  constructs  a  set  1Z  of  nodes  with  the 
property  that  if  the  system  starts  in  any  of  the  corresponding 
rectangles,  then  it  may  be  possible  to  reach  V.  The  comple¬ 
ment  of  this  set  is  an  infeasible  set  X. 

Algorithm  2.  Construct  an  infeasible  set  X 

initialize  1Z  with  V 
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repeat 

for  each  element  (q{ ...  qN)  of  1Z 

for  all  incident  nodes  {q [ . . .  q'N)  connected  with  an 
edge  (uni  or  bi-directional)  oriented  towards 
(q<  ... qN )  do 

if  (q[  . . .  q'N)  is  not  already  in  R.  then 
add  (q[ . . .  q'N)  to  1Z 

end  if 
end  for 
end  for 

until  cardinality  of  1Z  increases 

X  :=  complement  of  R.  with  respect  to  the  set  Q  of  all 
nodes 

Algorithm  2  for  the  construction  of  the  infeasible  set  X 
might  be  too  conservative,  i.e.,  the  set  X  might  be  unneces¬ 
sarily  small.  Indeed,  our  method  guarantees  the  existence  of 
a  trajectory  from  a  rectangle  I  (q,  . . .  qN)  to  an  adjacent  rect¬ 
angle  I  (q[ . . .  q’N)  if  the  (unidirectional  or  bidirectional)  edge 
between  the  corresponding  nodes  in  the  oriented  graph  has 
an  arrow  from  (q i . . .  qN)  to  (q[  . . .  q'N).  However,  if  there  is 
an  edge  from  (q i . . .  qN )  to  (q[ . . .  q'N)  and  also  an  edge  from 
(q [ . . .  q'N)  to  (q"  . . .  q jj),  we  cannot  guarantee  that  there  is  a 
trajectory  of  the  system  from  I  (q i . . .  qN)  to  I(q"  . . .  q1^).  In 
our  analysis,  we  simply  say  that  there  might  be  a  trajectory 
from  I(qt  ...qN)  to  /  (q"  . . .  <?"). 

This  “conservativeness”  is  the  main  issue  in  discrete  ab¬ 
stractions,  where  the  central  problem  is  to  determine  partitions 
of  continuous  or  hybrid  systems  such  that  the  discrete  quo¬ 
tient  determined  by  the  partition  is  equivalent  with  the  initial 
system  with  respect  to  reachability  properties.  Intuitively,  it  is 
easy  to  see  that  this  problem  is  solved  if  and  only  if  all  trajec¬ 
tories  in  a  given  region  reach  exactly  one  neighboring  region. 
In  this  case,  the  initial  continuous  or  hybrid  system  is  called 
decidable  and  the  discrete  quotient  induced  by  the  partition  is 
said  to  be  bi-similar  (Park  1980;  Pappas  2003)  with  the  initial 
system.  Finding  classes  of  decidable  systems  is  a  very  hard 
problem  that  received  much  attention  lately  (Henzinger  et  al. 
1995).  In  this  context.  Algorithm  1  produces  a  “sufficient” 
abstraction  (Alur,  Dang,  and  Ivancic  2002c),  that  can  be  used 
to  “conservatively”  construct  infeasible  sets. 

Remark  2.  The  above  algorithm  can  be  easily  extended  to 
construct  infeasible  sets  X  under  rectangular  parameter  uncer¬ 
tainties.  This  is  possible  because  the  parameters  c  capturing 
kinetic  constants  enter  the  vector  fields  (6)  in  the  same  way 
as  the  variables  x ,  so  the  system  (1)  defined  on  an  extended 
space  formed  by  species  concentrations  and  parameters  is  still 
characterized  by  multi-affine  vector  fields.  The  components 
of  the  vector  fields  corresponding  to  parameters  will  be  zero, 
meaning  that  the  kinetic  constants  are  assumed  constant  but 
unknown  within  given  ranges. 

Remark  3.  Algorithm  1  requires  the  vector  field  /  to  be 
evaluated  at  each  vertex.  If  there  are  Nr  rectangular  sets  in  the 
partition,  the  number  of  evaluations  is  Nrx  2N.  If  each  coordi¬ 


nate  is  divided  into  K  intervals,  Nr  =  K ' .  Thus,  the  number 
of  computations  scales  as  (2 K)N .  While  the  complexity  of 
this  algorithm  is  exponential  in  the  number  of  dimension,  the 
alternative,  which  is  exhaustive  simulation,  is  impractical. 

4.  Determining  The  Feasible  Set  T 

In  this  section  we  briefly  describe  the  RRFT  algorithm,  a  ran¬ 
domized  algorithm  that  can  be  used  to  delineate  a  feasible  set 
T — a  collection  of  rectangles  I(qu  . . . ,  qN),  (qt,  ... ,  qN )  e 
Q  which  contain  initial  conditions  that  can  reach  V.  Our  al¬ 
gorithm  leverages  the  work  in  randomized  motion  planning 
pioneered  by  the  robotics  community.  We  briefly  review  this 
work  before  introducing  our  algorithm. 

4.1.  Motion  Planning 

Probabilistic  Road  Maps  (PRMs)  can  be  used  to  solve  the 
motion  problem  (Amato  and  Wu  1996;  Kavraki  et  al.  1996), 
which  involves  finding  a  path  from  a  starting  point  to  a  goal 
point  in  configuration  space.  The  problem  is  usually  cast  in 
a  geometric  setting  with  no  kinematics  or  dynamics.  In  con¬ 
trast,  RRTs  (La Valle  and  Kuffner  2001a)  generate  random 
states  for  dynamic  systems  directly  by  working  in  the  space 
of  admissible  input  functions  u{t)  e  U.  The  algorithm  (see 
Figures  2  and  3)  constructs  a  tree  Txo  rooted  at  initial  state  x°, 
whose  vertices  are  states  x  e  X  and  whose  edges  are  inputs 
u(t)  e  U.  which  cause  the  system  to  evolve  from  one  ver¬ 
tex  to  a  connected  vertex.  The  algorithm  constructs  the  tree 
beginning  with  a  user-supplied  initial  state,  which  we  refer 
to  as  the  seed  value.  A  sample  state  is  generated  at  random, 
xrand  e  X.  It  is  then  determined  which  of  the  existing  states, 
xnear  e  jn  the  tree  ^g  closest  to  the  new  state,  and  which 
a new(t)  e  U,  when  applied  for  predetermined  time  interval 
At.  would  bring  the  system  as  close  as  possible  to  xrand .  The 
resulting  new  state  x"ew  is  added  as  a  vertex  to  Xxa  with  umw  ( t ) 
the  input  characterizing  the  edge  from  xnear  to  xnew .  This  pro¬ 
cedure  has  the  effect  of  growing  a  tree  whose  distribution 
of  vertices  approaches  that  of  the  random  distribution  which 
was  used  to  create  xrand,  causing  it  to  cover  the  state  space 
rather  rapidly.  A  survey  of  the  algorithm’s  properties  appears 
in  La  Valle  and  Kuffner  (2001b). 

Of  course,  the  application  of  such  algorithms  requires  the 
ability  to  simulate  system  dynamics.  For  hybrid  systems,  this 
requires  a  special  set  of  algorithms  (e.g.  Park  and  Barton  1996; 
Esposito,  Pappas,  and  Kumar  2001a)  to  properly  address  the 
non-smooth  nature,  and  integration  algorithms  capable  of  han¬ 
dling  system  evolving  at  disparate  time-scales  (e.g.,  Esposito 
and  Kumar  2001;  Esposito,  Pappas,  and  Kumar  2001b). 

4.2.  The  RRFT  Algorithm 

Unlike  motion  planning  in  which  the  end  goal  is  to  phys¬ 
ically  steer  the  system,  our  intention  is  merely  to  deter- 
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Fig.  2.  Growth  of  individual  trees  in  the  RRT  algorithm  (LaValle  and  Kuffner  2001a).  Each  tree  consists  of  vertices  which  are 
states  x,  and  edges  which  are  input  functions  u(t)  e  U.  First,  a  new  state  is  generated  at  random,  xrand  (left).  The  algorithm 
then  determines  the  closest  state,  xnear  in  the  tree  to  the  random  state  (right). 


Fig.  3.  Growth  of  individual  trees  in  the  RRT  algorithm  (continued  from  Figure  2).  After  finding  the  closest  node,  the 
algorithm  determines  which  u(t)  e  U  brings  xnear  closest  to  xrand  (left).  unew(t)  is  applied  for  a  predetermined  duration  At 
and  the  new  state  xnew  and  umw  are  added  to  the  tree  (right). 


mine  if  it  is  possible  for  the  system  to  reach  V  from  some 
x°  e  I (qi,  . . . ,  qN)  C  X°  within  a  finite  time-span  t  e 
[f0,  tf].  If  so,  the  set  / (<y, ,  ...  qN)  is  added  to  T .  All  possi¬ 
ble  I(qlt  ...  qN)  in  X°  are  tested.  In  this  way,  our  usage  of 
the  RRT  method  for  analysis  rather  than  synthesis  (Karatas 
and  Bullo  2001;  Frazzoli,  Dahleh,  and  Feron  2002;  Kim  and 
Ostrowski  2003)  is  closely  related  to  our  work  on  test  gen¬ 
eration  for  hybrid  systems  (Kim,  Keller,  and  Kumar  2003). 
While  the  RRT  algorithm  is  in  many  ways  suited  to  applica¬ 
tions  such  as  ours,  where  the  system  dynamics  are  complex 
and  high-dimensional,  the  RRT  only  addresses  time-varying 
inputs  such  as  u(t).  Recall  that  the  evolution  of  our  hybrid 
system  is  characterized  by  two  elements: 

•  the  initial  condition  x°  e  X°  for  the  evolution  of  the 
state; 

•  The  exogenous  modeling  noise  v(t)  e  A f  that  “steers” 
the  system. 

In  our  algorithm,  the  repeated  application  of  the  RRT  algo¬ 
rithm  results  in  a  tree  for  every  choice  of  initial  condition  x". 


Accordingly,  we  need  to  consider  a  set  of  trees  that  rapidly 
explore  the  state  space. 

One  key  component  of  this  approach  is  that  each  RRT  can 
be  computed  in  parallel  on  a  different  CPUs;  therefore,  we 
assume  a  fixed  computational  resource  that  will  dictate  the 
number  of  trees  that  can  be  simultaneously  computed  in  par¬ 
allel.  Fet  this  number  be  n,.  We  propose  the  RRFT  algorithm 
as  follows.  For  each  set  I(qIt ... ,  qs )  in  X°,  a  set  of  seed 
values  S  =  {j1,  . . .  s"'}  is  generated  from  a  quasi-random  se¬ 
quence,  where  each  s'  e  I(qlt qN).  RRTs,  Ts\, . . . Ts* ,, 
are  planted  for  each  of  these  “seed”  values.  As  the  RRT  algo¬ 
rithm  progresses,  we  monitor  the  progress  of  each  tree.  If,  at 
any  point,  the  growth  of  one  of  the  trees  (as  measured  by  a 
function  g(Tst))  drops  below  a  threshold  g,  or  the  coverage  of 
the  state  space  (as  measured  by  some  function  c(Tsi))  drops 
below  a  threshold  c,  the  tree  is  terminated.  Provided  the  set 
I(qi, ... ,  qN)  is  not  adequately  covered  (again  as  measured 
by  some  function  n(S))  a  new  “seed”  is  planted  and  a  new 
tree  is  initiated.  The  process  of  planting  and  growing  new 
trees  continues  until  a  trajectory  linking  I  (q . qN)  and  V 
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is  discovered  (in  which  case  l  (q} , ...  qN)  is  added  to  T),  or 
until  I(qi,  . . .  q N)  is  sufficiently  covered  (fi(S)  <  jl)  with 
trees  that  have  stopped  growing.  A  new  set  I(q[,  . . . ,  q'N )  is 
selected  and  the  process  is  repeated  until  all  of  X°  has  been 
tested. 

We  defer  the  discussion  of  how  to  compute  the  func¬ 
tions  g(Tsi),  c(Tsi),  and  n(S)  until  Section  4.3.  Note  that  in 
the  description  of  the  algorithm  below,  we  use  the  notation 
x°  +  f  HS(v(t))  df  to  denote  the  simulation  of  the  hybrid 
system  HS ,  characterizing  the  biomolecular  network,  over 
an  interval  time  At,  using  the  disturbance  function  v(t),  and 
initial  condition  .r°. 


Algorithm  3.  Construct  a  feasible  set  T. 

for  each  /(</,,  . . . ,  qN )  in  X°  do 

Generate  initial  seed  set  S  —  {U,  . . . ,  5"'}  where  s'  e 

I(qu  ■  ■  ■ ,  qs) 

for  i  —  1,  ...  n,  do 
Initialize  RRT  7j; 
end  for 
while  (true)  do 

for  i  —  1, ...  n,  do 
Extend(7ji) 
if  %i  n  V  f=  /)  then 
add  I(qi, . . . ,  qN)  to  T 
break 
else 

if  g(Tsi)  <  g,  OR,c(7j,)  <  c  then 
terminate  Tst 
n,  <—  n,  —  1 
if  /i(S)  >  jl  then 

generate  new  seed  point  snew  and  append  to  S 
initialize  Tsnew 
n,  <—  n,  +  1 

end  if 
end  if 
end  if 
end  for 
if  n,  =  0  then 
break 
end  if 
end  while 
end  for 


Algorithm  4.  Extend(T). 

xrand  4-  random() 

x"ear  4-  nearestNeighbor(T,  xrand) 

vnew  =  argmin vsM{dist((xrani ,  xnear  +  fA>  HS(v(t))dt )} 

xnew  =  xnear  +  H  S  (v"eW  (t))d  t 

add  vertex  xnew  to  T 

add  edge  vnew,  from  xnear  to  xnew ,  to  T 


4.3.  Adequacy  Criteria 

Theoretical  results  on  RRTs  from  the  motion  planning  litera¬ 
ture  (La Valle  and  Kuffner  2001b)  suggest  that  as  the  number 


of  nodes  in  the  tree  goes  to  infinity,  the  tree  should  cover  the 
entire  reachable  set,  although  it  is  impossible  to  determine  the 
reachable  set  in  advance.  However,  because  the  input  function 
space  must  be  discretized  and  because  the  algorithm  is  inher¬ 
ently  greedy,  it  is  possible  for  the  tree  to  create  new  nodes  that 
are  very  close  to  the  existing  nodes.  Therefore,  there  are  two 
plausible  reasons  to  stop  growing  a  tree  Tsi :  (1)  the  state  space 
is  sufficiently  covered  that  one  can  be  confident  no  trajectory 
exists  linking  I  (qt ,  . . . ,  qN )  and  V\  or  (2)  the  tree  is  no  longer 
actively  growing. 

In  order  to  determine  how  to  allocate  our  computational  re¬ 
sources  effectively  we  must  monitor  the  progress  of  each  tree. 
In  particular,  we  are  interested  in  three  measures:  the  coverage 
of  the  state  space  by  an  individual  RRT  c(Tsi ) ;  the  growth  rate 
of  an  individual  RRT,  g(Tsi);  and  coverage  of  /  (ql , . . . ,  qN) 
by  the  set  of  seeds  S,  fiiS).  We  explored  different  measures 
of  growth  and  coverage  including  the  discrepancy  and  disper¬ 
sion  (Branicky  et  al.  2001),  the  size  of  the  Voronoi  regions 
(La Valle  and  Kuffner  2001b),  as  well  as  the  volume  of  the 
convex  hull  of  the  tree  nodes  and  other  bounding  polygons. 
However,  we  found  these  measures  to  be  either  too  expen¬ 
sive  to  compute  in  high  dimensions  or  overly  conservative. 
Instead,  we  begin  by  overlaying  a  grid  of  ng  points  and  spac¬ 
ing  5  on  the  state  space.  Note  that  this  grid  is  not  used  to 
construct  the  tree,  merely  to  assess  its  coverage  and  growth. 
We  calculate  the  minimum  distance  from  each  grid  point  to 
the  set  of  nodes  in  the  tree,  dj.  The  quantity  minjdj,  S)  may 
be  thought  of  as  the  radius  of  the  largest  ball  centered  at  each 
grid  point  which  does  not  contain  a  tree  node  or  another  grid 
point  (see  Figure  4).  Clearly,  the  maximum  value  of  the  ra¬ 
dius  is  5,  the  spacing  between  adjacent  grid  points.  It  should 
be  stressed  that  this  list  of  distances  can  be  updated  incremen¬ 
tally  as  new  tree  nodes  are  added,  since  the  effect  of  each  new 
node  is  local.  We  define  the  coverage  of  the  tree  Tsi,  c(Tsi), 
as  the  average  of  all  the  distances  obtained  in  this  manner, 
normalized  by  the  grid  spacing: 


,(7- }  =  ly'  min(^-^) 

A  ‘  J  n 


7=1 


(11) 


Here,  ng  is  the  number  of  grid  points,  and  dj  is  the  radius 
of  the  largest  ball  centered  at  each  grid  point.  Clearly,  this 
measure  is  a  monotonically  decreasing  function.  If  it  goes  to 
zero  on  a  given  grid,  it  tells  us  that  any  set  whose  distance 
along  its  smallest  dimension  is  greater  than  the  grid  spacing 
has  been  entered.  Said  another  way,  the  state  space  is  covered 
up  to  a  resolution  equal  to  the  grid  spacing.  This  measure 
is  similar  to  an  approximation  of  dispersion  (Branicky  et  al. 
2001),  but  less  conservative  and  faster  to  compute.  Overall 
one  of  the  advantages  of  this  measure  is  that  the  grid  size  can 
be  as  fine  or  coarse  as  one  chooses.  Finer  grids  will  require 
more  distance  queries  but  are  more  accurate  indications  of 
coverage. 

The  derivative  of  c( %)  indicates  the  growth  of  the  tree. 
Therefore 
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Fig.  4.  A  grid  is  superimposed  on  the  state  space.  The 
shaded  regions  indicate  unreachable  sets.  The  average  of  the 
distances  from  the  grid  points  to  the  closest  nodes  (shown  as 
dashed  arrows)  should  converge  to  a  finite  number  as  the  tree 
fills  the  reachable  space. 


g(Tsi)  =  -Ac(Tsi)/AnVti,  (12) 

where  nv  i  is  number  of  vertices  in  tree  rooted  at  .v'. 

Regarding  the  coverage  of  the  set  l(qu  ... ,  qN)  by  S,  one 
appropriate  measure,  which  first  appeared  in  the  Monte  Carlo 
literature  and  later  has  been  used  in  the  context  of  RRTs  (Bran- 
icky  et  al.  2001),  is  dispersion.  Dispersion  is  the  radius  of  the 
largest  empty  ball  whose  center  lies  in  I(qlt qN )  which 
does  not  include  a  point  in  S.  It  is  a  measure  of  the  largest 
region  which  is  not  covered.  Therefore,  we  use  normalized 
dispersion  as  a  criteria  for  coverage  of  I(qlt . ,  qN ) : 

li(S)=  sup  mind(x,x)/n(S)max.  (13) 

«/(?! . qs)  xeX 

Here,  A  is  a  set  of  the  nodes  in  I{qx, . . . ,  qN)  and  tx{S)max 
is  the  largest  possible  dispersion  for  a  given  space.  From  a 
computational  point  of  view,  it  is  impractical  to  compute  dis¬ 
persion  in  high  dimensions.  Fortunately,  there  exist  sequences 
of  quasi-random  numbers  which  have  low  dispersion.  Ac¬ 
cordingly,  we  use  Halton  (1960)  sequence  to  generate  the 
initial  seed  values.  However,  we  cannot  use  a  deterministic 
sequence  to  plant  a  new  seed  due  to  the  existing  nodes  gen¬ 
erated  in  I(qi, ... ,  q\i)  as  trees  grow.  The  grid-based  method 
introduced  above  can  be  used  to  approximate  the  normalized 
dispersion.  We  overlay  grid  points  in  I  (qlt ... ,  qN)  and  mea¬ 
sure  the  distance  between  each  grid  point  and  existing  nodes 
in  the  set  to  find  the  radius  of  the  largest  empty  ball  whose 
center  is  one  of  the  grid  points.  A  new  seed  is  planted  at  the 
center  of  the  largest  empty  ball. 


5.  Case  Study:  Luminescence  Control  in 
Vibrio  fischeri 

In  this  section,  we  show  how  the  theory  and  algorithms  de¬ 
veloped  in  this  paper  can  be  applied  to  study  the  behavior 
of  a  specific  genetic  regulatory  network.  We  consider  for  il¬ 
lustration  the  phenomenon  of  bioluminescence  production  in 
the  marine  bacterium  V  fischeri ,  which  is  controlled  by  the 
transcriptional  activation  of  the  lux  genes  (James  et  al.  2000; 
Belta  et  al.  2001).  The  lux  regulon  is  organized  in  two  tran¬ 
scriptional  units,  0L  and  0R,  separated  by  a  regulatory  region 
called  the  lux  box,  as  shown  in  Figure  5.  The  leftward  operon, 
0L,  contains  the1 *  luxR  gene  encoding  protein  LuxR,  a  tran¬ 
scriptional  regulator  of  the  system.  The  rightward  operon  0R 
consists  of  seven  genes  luxICDABEG.  The  expression  of  the 
luxl  gene  results  in  the  production  of  protein  LuxI,  which 
is  required  for  endogenous  production  of  autoinducer ,  Ai,  a 
small  membrane-permeant  signal  molecule.  The  other  genes 
in  0R  are  involved  in  the  production  of  luminescence.  Finally, 
the  autoinducer  Ai  binds  to  protein  LuxR  to  form  a  complex 
C,  which  has  an  electronic  affinity  to  the  lux  box.  The  tran¬ 
scription  of  both  luxICDABEG  and  luxR  is  activated  by  the 
binding  of  C  to  the  lux  box,  which  is  modeled  using  a  contin¬ 
uous  piecewise  linear  activation  function  (see  Figure  6). 

A  nine -dimensional  model  for  this  network  is  presented  in 
Belta  et  al.  (2001).  For  illustrative  purposes,  we  consider  a 
simplification  that  is  possible  under  the  assumption  that  the 
dynamics  of  protein  Luxl  are  fast  (James  et  al.  2000).  With 
this  simplification,  the  system  becomes  three-dimensional 
( N  =  3)  with  state  x  =  [xjx^]7,  where  xt,  x2,  and  x3 
represent  the  concentrations  of  protein  LuxR,  complex  C, 
and  autonducer  Ai,  respectively.  The  main  reason  for  choos¬ 
ing  this  simplified  model  is  because  the  reduction  in  di¬ 
mensionality  allows  us  to  include  three-dimensional  trajec¬ 
tories  and  reachability  graphs,  which  would  not  be  possible 
in  higher-dimensional  space.  Examples  of  analysis  in  higher¬ 
dimensional  space  are  presented  in  James  et  al.  (2000)  and 
Belta  et  al.  (2004). 

Two  additive  exogeneous  inputs  v  —  [u,  v2]T  ( m  —  2)  are 
present  in  the  model.  In  the  presence  of  a  plasmid  that  pro¬ 
duces  LuxR  independently,  Vj  is  the  rate  of  transcription  of  the 
plasmid,  while  v2  models  an  external  source  of  autoinducer. 
More  generally,  they  can  represent  stochastic  uncertainty  that 
may  be  inherent  in  the  model.  We  will  let  J\f  be  a  rectangular 
set  given  by  [0,  v,,max]  x  [0,  v2,max]. 

Regarding  the  rectangular  partition  of  the  state  space,  we 
consider  n{  =  n2  =  n3  —  3.  The  thresholds  9j,  j  =  1,2 
represent  the  values  of  x2  for  which  the  dynamics  are  changed 
due  to  different  activation  rates  (see  Figure  6),  while  j  =  0,  3 
represent  physical  lower  and  upper  bounds.  The  other  division 
points  0j,  i  =  1,3 ,  j  =  0,  1, 2,  3  were  chosen  so  that  the 


1 .  We  use  italics  (e.g.,  luxR )  to  indicate  the  genes  and  plain  font  to  denote  the 

protein  expressed  by  the  gene  (e.g.,  LuxR). 
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Fig.  7.  The  simple  graph  for  the  partitioning  in  eq.  (14). 


state  space  is  divided  into  regions  of  interest,  which  could  be 
thought  of  as  “small”,  “medium”,  and  “large”  with  respect 
to  the  corresponding  specie  concentrations.  The  numerical 
values  of  these  constants  are  given  by 


^  0 

II 

O 

O 

II 

qT 

01 
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01  = 
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e°  =  o 
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01 
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01 

=  50 

II 
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Vl  ,,nax  =  200 

V2,max  =  200 

(14) 

and  by  eq.  (4) 

a(91?2«)  =  (eq^\  0?~\  (9f3_1), 

b(q'qiq 3)  =  {Of ,  Of ,  Of’ ) ,  qu 2,3  e  {1,  2,  3}. 

Following  the  notation  introduced  in  Section  2,  the  system  can 
be  represented  as  the  simple  graph  in  Figure  7.  The  dynamics 
in  each  of  the  27  rectangles  Ri{a(q'qiq3) ,  bUnqiq3))  =  I{qiq2q3), 
with  <7!  2,3  e  {1,2,  3}  are  given  by 

x  =  /<*««>  (*)  +  Bv  (15) 


for  qi,  q2  =  1,  2,  3  and  /  =  0.2  (see  Figure  6).  The  dynamics 
are  everywhere  continuous,  and  therefore  the  vector  fields 
on  adjacent  rectangles  coincide  on  the  common  facet.  The 
significance  of  the  state  variables  and  parameters  is  given  as 
follows: 

X\  =  protein  LuxR  (ml 

x2  =  complex  C  (;«/~3); 

x3  =  autoinducer  Ai(ml~3); 

k{  =  binding  rate  constant  (30  Z3m_1i_1); 

k2  =  dissociation  rate  constant  (10  f_1); 

n  =  diffusion  constant  (10  t~l)\ 
b  =  degradation  constant  for  LuxR  (3i_1); 
p  =  formation  of  Ai  due  to  lux  gene  activity 
( 30ml~3rl ); 

q  =  formation  of  LuxR  due  to  lux  gene  activity 
(5  m/-3r‘); 

s  =  scaling  constant  (10  t~l). 

5.1.  Obtaining  The  Infeasible  Set  T  by  Using  MARP 

Since  the  vector  field  given  by  eqs.  (15)  and  (16)  with  v3  = 
v2  =  0  is  continuous,  we  can  simply  apply  Algorithm  1  to 


where 


/ 


071*72  <73)  _ 


k2x2  —  £1*1X3  —  bx  1  +  qr(qiqM3) 

£1*1*3  —  £2*2 

£2*2  —  £1*1*3  —  nx  3  +  pr(qiq2q3) 


determine  the  orientation  of  the  edges  of  the  graph  given  in 
Figure  7.  The  result  is  given  in  Figure  8. 

Assume  the  property  set  V  is  given  by  the  target  rectan¬ 
gle  (222).  This  requires  the  execution  of  the  repeat  loop  in 
Algorithm  2  four  times: 


.  n  :=  V  =  {(222)} 

•  =  {(222),  (223),  (322),  (212)} 

•  =  {(222),  (223),  (322),  (212),  (213),  (323),  (312)} 

•  =  {(222),  (223),  (322),  (212),  (213),  (323),  (312), 
(313)}. 
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Fig.  8.  The  oriented  graph  obtained  by  applying  Algorithm  1  to  the  simple  graph  in  Figure  7.  Note  that  V  =  {(222)}. 


The  infeasible  set  X  is  the  complement  of  77,  and  it  consists  of 
the  remaining  27  —  8  =  19  rectangles.  The  biological  signif¬ 
icance  of  this  result  is  that  luminescence  (which  is  described 
by  relatively  high  values  of  complex  x2  and  autoinducer  x3) 
can  only  be  achieved  if  the  initial  level  of  autoinducer  x3  is 
high. 

5.2.  Obtaining  A  Feasible  Set  T  Using  The  RRFT  Algorithm 
In  this  section,  we  consider  the  construction  of  feasible  set  T 
for  the  system  with  noise  v{t)  using  RRFT.  In  Figure  9,  sev¬ 
eral  sample  trajectories  illustrate  the  computation  of  candidate 
trajectories  corresponding  to  the  (111 )— ( 121)  transition,  and 
the  (121 )— ( 111)  transition.  Each  of  these  trajectories  is  the 
result  of  planting  a  tree  with  a  different  seed.  Thus,  the  RRFT 
algorithm  can  be  used  to  obtain  sample  trajectories  for  edges 
in  the  directed  graph  in  Figure  8. 

Recall  that  the  property  set  V  is  the  rectangle  (222).  We  first 
consider  the  rectangle  ( 1 1 1 )  to  determine  candidate  points  for 
the  feasible  set  T .  Figure  10  shows  the  forest  of  trees  where 
a  solution  trajectory  is  found.  Ten  initial  seeds  are  generated 
and  a  forest  starts  to  grow  until  a  solution  is  found.  The  so¬ 
lution  trajectory,  the  modes,  and  the  transitions  are  shown  in 
Figure  11.  Figure  12  shows  c(7j.)  and  g(Tsi)  for  the  trees. 
As  thresholds  for  coverage  criteria,  we  use  c  =  1  x  1 0  5, 
g  =  1  x  1 0  “6,  and  jl  =  0.1.  Four  new  seeds  are  generated 
beyond  the  initial  seeds  in  this  case.  Figure  13  shows  the  cov¬ 
erage  of  the  set  (111)  as  new  seeds  are  generated.  First,  ten 
initial  seeds  are  generated  using  a  Halton  sequence  and  the 
next  four  seeds  are  generated  by  the  grid-based  method. 

6.  Conclusion 

Hybrid  systems  are  widely  used  in  robotics  to  model  the  use 
of  specific  controllers  and  estimators  in  different  regimes, 
switches  based  on  contact  mechanics  of  rolling  or  sliding,  or 


to  take  into  account  the  interactions  and  messaging  in  a  mul¬ 
tirobot  team.  Hybrid  systems  also  arise  naturally  as  models 
of  genetic  and  metabolic  networks.  They  capture  the  switch¬ 
ing  behavior  that  is  observed  in  phenomena  such  as  tran¬ 
scription,  protein-protein  interactions,  and  cell  division  and 
growth,  and  also  provide  global  descriptions  of  biological  sys¬ 
tems  described  locally  around  operating  points.  More  impor¬ 
tantly,  as  shown  in  this  paper,  they  allow  local  approximations 
that  lend  themselves  to  symbolic  reasoning  and  more  efficient 
computation. 

In  this  paper,  we  develop  computationally  efficient  tech¬ 
niques  to  analyze  hybrid  models  of  biomolecular  networks  by 
exploiting  their  specific  structure.  We  have  defined  the  frame¬ 
work  of  multi-affine  rectangular  hybrid  systems,  where  the 
vector  fields  have  product  type  nonlinearities  to  capture  the 
dynamics  of  chemical  reactions  and  the  invariants  are  rect¬ 
angular,  because  different  behaviors  emerge  as  a  function  of 
different  ranges  of  concentrations  of  regulatory  species.  To 
prove  qualitative  properties  of  such  systems,  which  are  biolog¬ 
ically  significant,  we  developed  the  MARP  algorithm  and  the 
RRFT  algorithm.  The  MARP  algorithm  yields  conservative 
results  due  to  overapproximations  of  the  underlying  reach¬ 
able  sets.  In  contrast,  the  RRFT  algorithm,  which  is  based  on 
trajectories  generated  from  simulation,  always  underapprox¬ 
imates  the  reachable  set.  However,  both  algorithms  provide 
complementary  tools  for  analysis.  This  is  illustrated  by  a  case 
study,  the  phenomenon  of  luminescence  production  in  the  ma¬ 
rine  bacterium  V.fischeri.  While  a  low-dimensional  case  study 
was  deliberately  chosen  to  facilitate  graphical  illustration,  the 
techniques  here  are  applicable  to  very  high-dimensional  sys¬ 
tems  (Alur  et  al.  2002a).  Future  work  is  being  directed  towards 
developing  tools  forformal  analysis  of  larger  classes  of  hybrid 
systems,  which  could  capture  more  complicated  biochemical 
phenomena,  and  developing  control  laws  for  species  in  the 
network  that  can  be  directly  controlled  from  outside  the  cell. 
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Fig.  9.  Sample  trajectories  from  (111)  to  (121)  (left)  and  (121)  to  (111)  (right). 


100  20 


Fig.  10.  The  forest  of  trees  computed  by  Algorithm  3  with  initial  conditions  from  the  (111)  rectangle  with  the  goal  of  finding 
a  trajectory  that  reaches  (222).  The  forest  is  shown  on  the  left.  The  trajectory  found  by  the  algorithm  that  reaches  (222)  is 
highlighted  (shown  dark)  on  the  right. 
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Fig.  11.  A  close  up  of  the  trajectory  found  by  Algorithm  3,  shown  in  Figure  10,  showing  the  trajectory  in  different  modes: 
top  left,  overall  solution  trajectory;  top  right,  segment  from  (111)  to  (121)  (note  that  the  axis  has  been  rotated  to  facilitate 
visualization);  bottom  left,  ( 121)— ( 122);  bottom  right,  (122)-(222). 


0 1)0 
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Fig.  12.  Coverage  of  the  trees.  New  trees  are  started  at  1046,  1072,  1294,  and  1379  iterations  because  the  growth  rate  slows 
below  the  specified  threshold  (g  =  1  x  10  Four  new  trees  are  generated  until  solution  is  discovered. 
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Fig.  13.  The  coverage  improves  (/u  (S)  decreases)  as  new  trees  are  seeded. 
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